Algorithms and Adaptivity Gaps for Stochastic k-TSP
Given a metric $(V, d)$ and a root vertex $r \in V$, the classic
\textsf{k-TSP} problem is to find a tour originating at the root
of minimum length that visits at least $k$ nodes in $V$. In this work,
motivated by applications where the input to an optimization problem is
uncertain, we study two stochastic versions of \textsf{k-TSP}.
In Stoch-Reward $k$-TSP, originally defined by Ene-Nagarajan-Saket [ENS17],
each vertex $v$ in the given metric contains a stochastic reward $R_v$.
The goal is to adaptively find a tour of minimum expected length that collects
at least reward $k$; here "adaptively" means our next decision may depend on
previous outcomes. Ene et al. give an $O(\log k)$-approximation adaptive
algorithm for this problem, and left open whether there is an
$O(1)$-approximation algorithm. We resolve their open question and even give an
$O(1)$-approximation \emph{non-adaptive} algorithm for this problem.
We also introduce and obtain similar results for the Stoch-Cost $k$-TSP
problem. In this problem each vertex $v$ has a stochastic cost $C_v$, and the
goal is to visit and select at least $k$ vertices to minimize the expected
\emph{sum} of tour length and cost of selected vertices. This problem
generalizes the Price of Information framework [Singla18] from deterministic
probing costs to metric probing costs.
Our techniques are based on two crucial ideas: "repetitions" and "critical
scaling". We show using Freedman's and Jogdeo-Samuels' inequalities that for
our problems, if we truncate the random variables at an ideal threshold and
repeat, then their expected values form a good surrogate. Unfortunately, this
ideal threshold is adaptive as it depends on how far we are from achieving our
target $k$, so we truncate at various scales and identify a
"critical" scale.
Comment: ITCS 202
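The "truncate and repeat" idea can be sketched numerically. The toy below (an illustration under my own assumptions; the halving criterion for "critical" is a stand-in, not the paper's exact rule) truncates a stochastic reward at geometric scales and picks the first scale at which the truncated expectation is a good surrogate:

```python
import random

def truncated_mean(sample_reward, tau, trials=10000, seed=0):
    """Monte-Carlo estimate of E[min(R, tau)], the reward truncated at tau."""
    rng = random.Random(seed)
    return sum(min(sample_reward(rng), tau) for _ in range(trials)) / trials

def critical_scale(sample_reward, target, base=2.0, max_power=20):
    """Scan geometric truncation scales tau = 1, 2, 4, ... and return the
    first one whose truncated mean is at least half of min(tau, target).
    The halving threshold here is illustrative only."""
    for i in range(max_power + 1):
        tau = float(base ** i)
        if truncated_mean(sample_reward, tau) >= 0.5 * min(tau, target):
            return tau
    return float(base ** max_power)
```

The point of scanning geometric scales is exactly the one made above: the "ideal" truncation threshold depends on how far we are from the target, so a non-adaptive algorithm hedges by trying every scale.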
Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation
Knowledge distillation is one of the primary methods of transferring
knowledge from large to small models. However, it requires massive amounts of
task-specific data, which may not be available in many real-world applications.
Data augmentation methods such as representation interpolation, token
replacement, or augmentation with models are applied to tackle this problem.
However, these data augmentation methods either potentially cause shifts in
decision boundaries (representation interpolation), are not expressive enough
(token replacement), or introduce too much computational overhead (augmentation
with models). To this end, we propose AugPro (Augmentation with Projection), an
effective and efficient data augmentation method for distillation. Our method
builds on top of representation interpolation augmentation methods to maintain
the diversity of expressions and converts the augmented data to tokens to avoid
shifting decision boundaries. It uses simple operations that come with little
computational overhead. The results on multiple GLUE tasks show that our
method can improve distillation performance by a large margin at a low time
cost. Code is available at
https://github.com/google-research/google-research/tree/master/augpro.
Comment: 20 pages, 5 figures. Accepted by ICLR 202
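The interpolate-then-project idea can be sketched in a few lines. This is my own toy reading of it (the function name, the mixup coefficient `lam`, and the Euclidean nearest-neighbour projection rule are assumptions; see the repository above for the real implementation): mix two token-embedding sequences, then snap each mixed vector back to its nearest vocabulary embedding so the augmented example is again a discrete token sequence.

```python
import numpy as np

def augment_with_projection(emb_a, emb_b, vocab_emb, lam=0.5):
    """Mixup-style interpolation of two token-embedding sequences
    (seq_len, dim), followed by projection of each mixed vector onto the
    nearest vocabulary embedding (vocab_size, dim). Returns token ids."""
    mixed = lam * emb_a + (1 - lam) * emb_b                # (seq_len, dim)
    # Euclidean distance from every mixed vector to every vocab embedding
    dists = np.linalg.norm(vocab_emb[None, :, :] - mixed[:, None, :], axis=-1)
    return dists.argmin(axis=1)                            # (seq_len,) ids
```

Projecting back to tokens is what addresses the decision-boundary concern raised above: the student only ever sees inputs that lie on the discrete token manifold, while the interpolation step preserves diversity.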
ReSQueing Parallel and Private Stochastic Convex Optimization
We introduce a new tool for stochastic convex optimization (SCO): a
Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function
convolved with a (Gaussian) probability density. Combining ReSQue with recent
advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop
algorithms achieving state-of-the-art complexities for SCO in parallel and
private settings. For a SCO objective constrained to the unit ball in
$\mathbb{R}^d$, we obtain the following results (up to polylogarithmic
factors). We give a parallel algorithm obtaining optimization error
$\epsilon_{\text{opt}}$ with gradient oracle query depth
$d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ and
$d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient
queries in total, assuming access to a bounded-variance stochastic gradient
estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm
matches the state-of-the-art oracle depth of
[BJLLS19] while maintaining the optimal total work of stochastic gradient
descent. Given $n$ samples of Lipschitz loss functions, prior works [BFTT19,
BFGT20, AFKT21, KLL21] established that if $n \gtrsim d\epsilon_{\text{dp}}^{-2}$,
$(\epsilon_{\text{dp}}, \delta)$-differential
privacy is attained at no asymptotic cost to the SCO utility. However, these
prior works all required a superlinear number of gradient queries. We close
this gap for sufficiently large $n$, by
using ReSQue to design an algorithm with near-linear gradient query complexity
in this regime.